Introduction

TODO

Datasets

Downloaded

The first dataset considered is the Steam Video Games Dataset. It is a list of user behaviors with four columns: user-id, game-title, behavior-name, and value. The behaviors included are ‘purchase’ and ‘play’, and the value indicates the degree to which the behavior was performed: for ‘purchase’ the value is always 1, and for ‘play’ it is the number of hours the user has played the game.

user.id    game.title                  behavior.name  value
151603712  The Elder Scrolls V Skyrim  purchase       1.0
151603712  The Elder Scrolls V Skyrim  play           273.0
151603712  Fallout 4                   purchase       1.0
151603712  Fallout 4                   play           87.0
151603712  Spore                       purchase       1.0
151603712  Spore                       play           14.9
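
As a rough illustration of how these rows can be read, the sketch below collapses the behavior list into one record per user: the number of purchased games and the total hours played. It uses only the sample rows shown above; the function name `summarize_users` and the tuple layout are assumptions made for this example, not part of the dataset.

```python
from collections import defaultdict

# Sample rows copied from the excerpt above:
# (user_id, game_title, behavior_name, value)
rows = [
    (151603712, "The Elder Scrolls V Skyrim", "purchase", 1.0),
    (151603712, "The Elder Scrolls V Skyrim", "play", 273.0),
    (151603712, "Fallout 4", "purchase", 1.0),
    (151603712, "Fallout 4", "play", 87.0),
    (151603712, "Spore", "purchase", 1.0),
    (151603712, "Spore", "play", 14.9),
]


def summarize_users(rows):
    """Return {user_id: {"games": purchased games, "hours": hours played}}.

    Hypothetical helper: a 'purchase' row counts one game,
    a 'play' row adds its value (hours) to the user's total.
    """
    summary = defaultdict(lambda: {"games": 0, "hours": 0.0})
    for user_id, _title, behavior, value in rows:
        if behavior == "purchase":
            summary[user_id]["games"] += 1
        elif behavior == "play":
            summary[user_id]["hours"] += value
    return dict(summary)


user_summary = summarize_users(rows)
```

For the six sample rows this gives a single user with three purchased games and the sum of the three playtimes.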

The dataset contains 200k entries covering over 12k different users and over 5k games. The data are strongly right-skewed (skewness 10.74), with a median of 2 and a mean of 10 game purchases per user. Playtime also varies widely between gamers, from users who played for less than 5 minutes to users with thousands of hours. The user with the highest number of games owns 1552 games, with a playtime of 6778 hours, whereas the user who spent the most time playing logged 11906 hours across 433 different games. Unfortunately, the dataset does not record the period over which the hours were played.
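
The gap between median and mean is what a positive skewness summarizes. The sketch below shows one way to compute these three statistics for a list of per-user purchase counts. The text does not say which skewness estimator produced 10.74, so the moment-based (Fisher-Pearson) coefficient used here is an assumption, and the `purchases_per_user` values are made-up toy numbers, not taken from the dataset.

```python
import statistics


def skewness(values):
    """Fisher-Pearson moment coefficient of skewness: m3 / m2**1.5.

    Assumption: the source does not state which estimator it used;
    bias-adjusted variants give slightly different numbers.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Toy, made-up purchase counts: most users own few games, one owns many.
purchases_per_user = [1, 1, 1, 2, 2, 3, 5, 40]

median_purchases = statistics.median(purchases_per_user)
mean_purchases = statistics.mean(purchases_per_user)
skew_purchases = skewness(purchases_per_user)
```

A single heavy buyer pulls the mean well above the median and makes the skewness positive, which is the same pattern the real data show on a much larger scale.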

The second dataset used is the Steam games complete dataset. It lists 40k games, each with information such as genre, developer, associated tags, and description. For the purpose of this assignment only a subset of the columns is of interest; the URL of the Steam page, for example, is not useful here. A glimpse of the data follows.

name                                        release_date  genre                                                   developer
DOOM                                        May 12, 2016  Action                                                  id Software
PLAYERUNKNOWN’S BATTLEGROUNDS               Dec 21, 2017  Action,Adventure,Massively Multiplayer                  PUBG Corporation
BATTLETECH                                  Apr 24, 2018  Action,Adventure,Strategy                               Harebrained Schemes
DayZ                                        Dec 13, 2018  Action,Adventure,Massively Multiplayer                  Bohemia Interactive
EVE Online                                  May 6, 2003   Action,Free to Play,Massively Multiplayer,RPG,Strategy  CCP
Grand Theft Auto V: Premium Online Edition  NaN           Action,Adventure                                        Rockstar North

Analyzed

Because I was interested in the connections between gamers and the types of games they play, I created two sub-datasets: users_info.csv is a subset of the first dataset, while games_info.csv is a subset of the second. They were created by joining the initial datasets so that, from now on, only games with at least one player are considered, and only players who play games for which we actually have details. In total we consider 2k games and 10k users, with over 90k user-game interactions (either “purchase” or “play”). Every user plays at least one of the 2k games and every game has at least one player.
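
The mutual filtering described above can be sketched as follows. This is only an illustration under stated assumptions: it matches games by exact title, which may not be the key actually used for the join, and the helper name `restrict_to_common_games` as well as the toy user ids and interaction rows are hypothetical.

```python
def restrict_to_common_games(interactions, games):
    """Keep only interactions on known games, and only games with interactions.

    interactions: list of (user_id, game_title, behavior_name) tuples
    games: dict mapping game title -> details (here, a genre string)
    Assumption: titles match exactly between the two datasets.
    """
    known_titles = set(games)
    kept_interactions = [row for row in interactions if row[1] in known_titles]
    played_titles = {row[1] for row in kept_interactions}
    kept_games = {t: info for t, info in games.items() if t in played_titles}
    return kept_interactions, kept_games


# Toy, made-up interactions; the game details come from the excerpt above.
toy_interactions = [
    (1, "DOOM", "purchase"),
    (1, "DOOM", "play"),
    (1, "Spore", "play"),      # no details available for Spore: dropped
    (2, "Spore", "purchase"),  # user 2 has no remaining games: dropped
]
toy_games = {
    "DOOM": "Action",
    "DayZ": "Action,Adventure,Massively Multiplayer",  # no players: dropped
}

kept_interactions, kept_games = restrict_to_common_games(toy_interactions, toy_games)
kept_users = {row[0] for row in kept_interactions}
```

After filtering, every remaining user has at least one game with details and every remaining game has at least one player, which is the property stated for users_info.csv and games_info.csv.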

About the users

Type of gamers

TODO extend

From the plot we can note that:
1. the clustering makes more sense if we look at “default”
2. the gamers with the most hours are not the ones with the most games
3. the gamers with the most games are the ones who spent the most money
4. most gamers spent little and played little

What about the highlighted gamers?